
    Exploring the Limitations of Behavior Cloning for Autonomous Driving

    Driving requires reacting to a wide variety of complex environmental conditions and agent behaviors. Explicitly modeling each possible scenario is unrealistic. In contrast, imitation learning can, in theory, leverage data from large fleets of human-driven cars. Behavior cloning in particular has been successfully used to learn simple visuomotor policies end-to-end, but scaling to the full spectrum of driving behaviors remains an unsolved problem. In this paper, we propose a new benchmark to experimentally investigate the scalability and limitations of behavior cloning. We show that behavior cloning leads to state-of-the-art results, including in unseen environments, executing complex lateral and longitudinal maneuvers without these reactions being explicitly programmed. However, we confirm well-known limitations (due to dataset bias and overfitting), new generalization issues (due to dynamic objects and the lack of a causal model), and training instability requiring further research before behavior cloning can graduate to real-world driving. The code of the studied behavior cloning approaches can be found at https://github.com/felipecode/coiltraine
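    For concreteness, below is a minimal sketch of the behavior cloning objective studied here: a policy network regresses the expert's controls from camera images with a plain supervised loss. It is an illustrative PyTorch toy, not the coiltraine code; the network `PolicyNet`, its layer sizes, and the L1 loss choice are assumptions made for the example.

```python
# Minimal behavior-cloning step (illustrative sketch, not the coiltraine code).
# A toy policy network regresses expert controls from camera frames; the loss
# is a plain L1 regression against the demonstrated actions.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):  # hypothetical stand-in for the studied CNN policies
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 3)  # steering, throttle, brake

    def forward(self, image):
        return self.head(self.backbone(image))

def bc_step(policy, optimizer, images, expert_actions):
    """One supervised update on a batch of (image, expert action) pairs."""
    pred = policy(images)
    loss = nn.functional.l1_loss(pred, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

policy = PolicyNet()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
images = torch.randn(8, 3, 88, 200)   # dummy camera batch
actions = torch.rand(8, 3)            # dummy expert controls
print(bc_step(policy, opt, images, actions))
```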

    On Offline Evaluation of Vision-based Driving Models

    Autonomous driving models should ideally be evaluated by deploying them on a fleet of physical vehicles in the real world. Unfortunately, this approach is not practical for the vast majority of researchers. An attractive alternative is to evaluate models offline, on a pre-collected validation dataset with ground truth annotation. In this paper, we investigate the relation between various online and offline metrics for evaluation of autonomous driving models. We find that offline prediction error is not necessarily correlated with driving quality, and two models with identical prediction error can differ dramatically in their driving performance. We show that the correlation of offline evaluation with driving quality can be significantly improved by selecting an appropriate validation dataset and suitable offline metrics. The supplementary video can be viewed at https://www.youtube.com/watch?v=P8K8Z-iF0cY. Comment: Published at the ECCV 2018 conference.
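    As a hedged illustration of the offline/online mismatch the paper quantifies, the snippet below computes an offline metric (mean absolute steering error) for a few models and correlates it with an online driving metric (success rate). All numbers are synthetic and only show the computation, not results from the paper.

```python
# Illustrative computation of an offline metric (mean absolute steering error)
# and its correlation with an online metric (success rate). Values are dummy.
import numpy as np

def offline_mae(pred_steer, gt_steer):
    """Offline metric: mean absolute error against ground-truth steering."""
    return float(np.mean(np.abs(pred_steer - gt_steer)))

rng = np.random.default_rng(0)
gt = rng.uniform(-1, 1, size=500)  # ground-truth steering from a validation log
models = {f"model_{i}": gt + rng.normal(0, 0.05 * (i + 1), size=500) for i in range(5)}

maes = np.array([offline_mae(p, gt) for p in models.values()])
success = rng.uniform(0.3, 0.9, size=5)  # hypothetical online success rates

# Pearson correlation between offline error and online driving quality;
# the paper shows this can be weak unless dataset and metric are chosen carefully.
corr = np.corrcoef(maes, success)[0, 1]
print(f"offline MAEs: {np.round(maes, 3)}  correlation with success: {corr:.2f}")
```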

    End-to-end Driving via Conditional Imitation Learning

    Deep networks trained on demonstrations of human driving have learned to follow roads and avoid obstacles. However, driving policies trained via imitation learning cannot be controlled at test time. A vehicle trained end-to-end to imitate an expert cannot be guided to take a specific turn at an upcoming intersection. This limits the utility of such systems. We propose to condition imitation learning on high-level command input. At test time, the learned driving policy functions as a chauffeur that handles sensorimotor coordination but continues to respond to navigational commands. We evaluate different architectures for conditional imitation learning in vision-based driving. We conduct experiments in realistic three-dimensional simulations of urban driving and on a 1/5 scale robotic truck that is trained to drive in a residential area. Both systems drive based on visual input yet remain responsive to high-level navigational commands. The supplementary video can be viewed at https://youtu.be/cFtnflNe5fM. Comment: Published at the International Conference on Robotics and Automation (ICRA), 2018.
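    A minimal sketch of the command-conditioned (branched) idea is given below: a shared perception module produces a feature vector and the high-level command selects one of several action heads. Layer sizes and module names are illustrative assumptions, not the published architecture.

```python
# Sketch of command-conditioned imitation learning with a branched head:
# one action head per high-level command (follow lane, left, right, straight).
# Layer sizes and names are illustrative, not the published CIL architecture.
import torch
import torch.nn as nn

COMMANDS = ["follow", "left", "right", "straight"]

class BranchedCILPolicy(nn.Module):
    def __init__(self, feat_dim=128, n_actions=3):
        super().__init__()
        self.perception = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # One branch per navigational command; the command selects the branch.
        self.branches = nn.ModuleList(
            [nn.Linear(feat_dim, n_actions) for _ in COMMANDS]
        )

    def forward(self, image, command_idx):
        feat = self.perception(image)
        out = torch.stack([b(feat) for b in self.branches], dim=1)  # (B, C, A)
        idx = command_idx.view(-1, 1, 1).expand(-1, 1, out.size(-1))
        return out.gather(1, idx).squeeze(1)  # action of the commanded branch

policy = BranchedCILPolicy()
img = torch.randn(2, 3, 88, 200)
cmd = torch.tensor([0, 2])          # "follow" and "right"
print(policy(img, cmd).shape)       # torch.Size([2, 3])
```

    At training time the same selection applies: only the branch matching the demonstration's command receives gradient, so each head specializes to one navigational behavior.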

    Scaling Vision-based End-to-End Driving with Multi-View Attention Learning

    In end-to-end driving, human driving demonstrations are used to train perception-based driving models by imitation learning. This process is supervised by vehicle signals (e.g., steering angle, acceleration) but does not require extra costly supervision (human labeling of sensor data). As a representative of such vision-based end-to-end driving models, CILRS is commonly used as a baseline to compare with new driving models. So far, some recent models achieve better performance than CILRS by using expensive sensor suites and/or large amounts of human-labeled data for training. Given the difference in performance, one may think that it is not worth pursuing vision-based pure end-to-end driving. However, we argue that this approach still has great value and potential considering cost and maintenance. In this paper, we present CIL++, which improves on CILRS both by processing higher-resolution images using a human-inspired horizontal field of view (HFOV) as an inductive bias and by incorporating a proper attention mechanism. CIL++ achieves competitive performance compared to models that are more costly to develop. We propose to replace CILRS with CIL++ as a strong vision-based pure end-to-end driving baseline supervised by only vehicle signals and trained by conditional imitation learning. Comment: This paper has been accepted to the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023).
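    The sketch below illustrates the multi-view attention idea in the spirit of CIL++: each camera view is encoded into a feature token and a transformer encoder attends across views before regressing controls. Dimensions, module names, and the pooling choice are assumptions made for illustration, not the published CIL++ implementation.

```python
# Hedged sketch of multi-view attention: per-view features become tokens and
# self-attention mixes information across views before the control head.
import torch
import torch.nn as nn

class MultiViewAttentionDriver(nn.Module):
    def __init__(self, n_views=3, d_model=128, n_actions=2):
        super().__init__()
        self.encoder = nn.Sequential(  # shared per-view image encoder
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, d_model, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=2)
        self.view_embed = nn.Parameter(torch.zeros(1, n_views, d_model))
        self.head = nn.Linear(d_model, n_actions)  # e.g., steering, acceleration

    def forward(self, views):                       # views: (B, V, 3, H, W)
        b, v = views.shape[:2]
        tokens = self.encoder(views.flatten(0, 1)).view(b, v, -1)
        tokens = self.attn(tokens + self.view_embed)  # attention across views
        return self.head(tokens.mean(dim=1))          # pool views -> controls

driver = MultiViewAttentionDriver()
print(driver(torch.randn(2, 3, 3, 96, 96)).shape)   # torch.Size([2, 2])
```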

    Autobots: Latent Variable Sequential Set Transformers

    Robust multi-agent trajectory prediction is essential for the safe control of robots and vehicles that interact with humans. Many existing methods treat social and temporal information separately and therefore fall short of modelling the joint future trajectories of all agents in a socially consistent way. To address this, we propose a new class of Latent Variable Sequential Set Transformers which autoregressively model multi-agent trajectories. We refer to these architectures as "AutoBots". AutoBots model the contents of sets (e.g., representing the properties of agents in a scene) over time and employ multi-head self-attention blocks over these sequences of sets to encode the sociotemporal relationships between the different actors of a scene. This produces either the trajectory of one ego-agent or a distribution over the future trajectories for all agents under consideration. Our approach works for general sequences of sets and we provide illustrative experiments modelling the sequential structure of the multiple strokes that make up symbols in the Omniglot data. For the single-agent prediction case, we validate our model on the NuScenes motion prediction task and achieve competitive results on the global leaderboard. In the multi-agent forecasting setting, we validate our model on TrajNet. We find that our method outperforms physical extrapolation and recurrent network baselines and generates scene-consistent trajectories. Comment: 21 pages, 15 figures, 5 tables.
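    A minimal sketch of attention over a sequence of sets, in the spirit of the AutoBots encoder, is shown below: agent states over time are processed with social attention (across agents at each timestep) followed by temporal attention (across each agent's own timeline). Shapes and module names are illustrative assumptions, not the published model.

```python
# Hedged sketch of set-sequence attention: social attention mixes agents
# within a timestep, temporal attention mixes timesteps within an agent.
import torch
import torch.nn as nn

class SetSequenceEncoder(nn.Module):
    def __init__(self, d_model=64, nhead=4):
        super().__init__()
        self.social = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.temporal = nn.MultiheadAttention(d_model, nhead, batch_first=True)

    def forward(self, x):                     # x: (B, T, A, D)
        b, t, a, d = x.shape
        # Social attention: treat the agents at one timestep as a set.
        s = x.reshape(b * t, a, d)
        s, _ = self.social(s, s, s)
        x = s.reshape(b, t, a, d)
        # Temporal attention: attend over each agent's own timeline.
        m = x.permute(0, 2, 1, 3).reshape(b * a, t, d)
        m, _ = self.temporal(m, m, m)
        return m.reshape(b, a, t, d).permute(0, 2, 1, 3)  # back to (B, T, A, D)

enc = SetSequenceEncoder()
traj = torch.randn(2, 10, 5, 64)   # 2 scenes, 10 timesteps, 5 agents
print(enc(traj).shape)             # torch.Size([2, 10, 5, 64])
```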

    On building end-to-end driving models through imitation learning

    Autonomous vehicles are now considered an assured asset of the future. Literally all the relevant car makers are in a race to produce fully autonomous vehicles. These car makers usually rely on modular pipelines for designing autonomous vehicles. This strategy decomposes the problem into a variety of tasks such as object detection and recognition, semantic and instance segmentation, depth estimation, SLAM and place recognition, as well as planning and control. Each module requires a separate set of expert algorithms, which are costly, especially in the amount of human labor and the need for data labeling.
    An alternative that has recently drawn considerable interest is end-to-end driving. In the end-to-end driving paradigm, perception and control are learned simultaneously using a deep network. These sensorimotor models are typically obtained by imitation learning from human demonstrations. The main advantage is that this approach can learn directly from large fleets of human-driven vehicles without requiring a fixed ontology and extensive amounts of labeling. However, scaling end-to-end driving methods to behaviors more complex than simple lane keeping or lead-vehicle following remains an open problem. In this thesis, in order to achieve more complex behaviors, we address several issues that arise when creating an end-to-end driving system through imitation learning. The first of these is the need for an environment for algorithm evaluation and for collecting driving demonstrations. On this matter, we participated in the creation of the CARLA simulator, an open-source platform built from the ground up for autonomous driving validation and prototyping. Since the end-to-end approach is purely reactive, there is also the need to provide an interface with a global planning system. To this end, we propose conditional imitation learning, which conditions the produced actions on a high-level command. Evaluation is also a concern and is commonly performed by comparing the end-to-end network output to some pre-collected driving dataset. We show that this is surprisingly weakly correlated with actual driving and propose strategies for acquiring better data and a better comparison strategy. Finally, we confirm well-known generalization issues (due to dataset bias and overfitting), new ones (due to dynamic objects and the lack of a causal model), and training instability; problems requiring further research before end-to-end driving through imitation can scale to real-world driving.

    Multimodal end-to-end autonomous driving

    Other grants: Antonio M. Lopez acknowledges the financial support by ICREA under the ICREA Academia Program. We also thank the Generalitat de Catalunya CERCA Program, as well as its ACCIO agency.
    A crucial component of an autonomous vehicle (AV) is the artificial intelligence (AI) that is able to drive towards a desired destination. Today, there are different paradigms addressing the development of AI drivers. On the one hand, we find modular pipelines, which divide the driving task into sub-tasks such as perception and maneuver planning and control. On the other hand, we find end-to-end driving approaches that try to learn a direct mapping from input raw sensor data to vehicle control signals. The latter are relatively less studied, but are gaining popularity since they are less demanding in terms of sensor data annotation. This paper focuses on end-to-end autonomous driving. So far, most proposals relying on this paradigm assume RGB images as input sensor data. However, AVs will not be equipped only with cameras, but also with active sensors providing accurate depth information (e.g., LiDARs). Accordingly, this paper analyses whether combining RGB and depth modalities, i.e. using RGBD data, produces better end-to-end AI drivers than relying on a single modality. We consider multimodality based on early, mid and late fusion schemes, both in multisensory and single-sensor (monocular depth estimation) settings. Using the CARLA simulator and conditional imitation learning (CIL), we show how, indeed, early fusion multimodality outperforms single-modality.
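    To make the three fusion schemes concrete, here is a hedged PyTorch sketch of early, mid, and late RGBD fusion with toy networks; it only illustrates where the modalities are combined and is not the architecture used in the paper.

```python
# Hedged sketch of the three RGBD fusion schemes: early (channel concatenation
# at the input), mid (feature concatenation after separate encoders) and late
# (combining per-modality control outputs). Toy networks, illustrative only.
import torch
import torch.nn as nn

def small_cnn(in_ch, out_dim=64):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 5, stride=2), nn.ReLU(),
        nn.Conv2d(32, out_dim, 3, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

rgb = torch.randn(2, 3, 96, 96)    # camera image
depth = torch.randn(2, 1, 96, 96)  # depth map (active sensor or monocular estimate)

# Early fusion: stack RGB and depth into a 4-channel input.
early = nn.Sequential(small_cnn(4), nn.Linear(64, 3))
a_early = early(torch.cat([rgb, depth], dim=1))

# Mid fusion: encode each modality separately, concatenate the features.
enc_rgb, enc_d = small_cnn(3), small_cnn(1)
mid_head = nn.Linear(128, 3)
a_mid = mid_head(torch.cat([enc_rgb(rgb), enc_d(depth)], dim=1))

# Late fusion: each modality predicts controls; average the predictions.
late_rgb = nn.Sequential(small_cnn(3), nn.Linear(64, 3))
late_d = nn.Sequential(small_cnn(1), nn.Linear(64, 3))
a_late = 0.5 * (late_rgb(rgb) + late_d(depth))

print(a_early.shape, a_mid.shape, a_late.shape)  # all torch.Size([2, 3])
```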